Search CORE

5 research outputs found

App Review Driven Collaborative Bug Finding

Authors: Bissyande Tegawendé F.; Klein Jacques; Kong Pingfan; Liu Kui; Tang Xunzhu; Tian Haoye
Publication date: 23/01/2023

Software development teams generally welcome any effort to expose bugs in their code base. In this work, we build on the hypothesis that mobile apps from the same category (e.g., two web browser apps) may be affected by similar bugs in their evolution process. It is therefore possible to transfer the experience of one historical app to quickly find bugs in its new counterparts. This has been referred to as collaborative bug finding in the literature. Our novelty is that we guide the bug finding process by considering that existing bugs have been hinted at in app reviews. Concretely, we design the BugRMSys approach to recommend bug reports for a target app by matching historical bug reports from apps in the same category with user app reviews of the target app. We experimentally show that this approach enables us to quickly expose and report dozens of bugs for targeted apps such as Brave (web browser app). BugRMSys's implementation relies on DistilBERT to produce natural language text embeddings. Our pipeline considers similarities between bug reports and app reviews to identify relevant bugs. We then focus on the app review as well as potential reproduction steps in the historical bug report (from a same-category app) to reproduce the bugs. Overall, after applying BugRMSys to six popular apps, we were able to identify, reproduce, and report 20 new bugs: among these, 9 reports have already been triaged, 6 have been confirmed, and 4 have been fixed by the official development teams.
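
The matching step described in this abstract can be illustrated with a minimal sketch. This is not the BugRMSys implementation: the abstract says the tool uses DistilBERT embeddings, whereas the sketch below swaps in a bag-of-words count vector so that it runs with the standard library alone. The function names (`embed`, `cosine`, `recommend`) and the `threshold` value are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a DistilBERT embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(bug_reports, reviews, threshold=0.3):
    """Pair historical bug reports with target-app reviews.

    Returns (score, bug_report, review) triples whose similarity reaches
    the (hypothetical) threshold, sorted from most to least similar.
    """
    pairs = []
    for report in bug_reports:
        report_vec = embed(report)
        for review in reviews:
            score = cosine(report_vec, embed(review))
            if score >= threshold:
                pairs.append((score, report, review))
    return sorted(pairs, key=lambda p: p[0], reverse=True)


if __name__ == "__main__":
    historical = ["App crashes when opening a new tab",
                  "Dark mode setting is not saved"]
    target_reviews = ["crashes every time I open a new tab",
                      "love the ad blocker"]
    for score, report, review in recommend(historical, target_reviews):
        print(f"{score:.2f}  {report!r} <-> {review!r}")
```

In a setting closer to the one described, `embed` would presumably return a dense vector from a language model and the cosine would be computed over those vectors; the pairing and ranking logic would stay the same.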

arXiv.org e-Print Archive

Characterizing malicious Android apps by mining topic-specific data flow signatures

Authors: BISSYANDE Tegawendé F.; KLEIN Jacques; LI Li; LO David; XIA Xin; YANG Xinli
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Context: State-of-the-art works on automated detection of Android malware have leveraged app descriptions to spot anomalies with respect to the functionality implemented, or have used data flow information as a feature to discriminate malicious from benign apps. Although these works have yielded promising performance, we hypothesize that this performance can be improved by a better understanding of malicious behavior. Objective: To characterize malicious apps, we take into account both information on app descriptions, which are indicative of apps’ topics, and information on sensitive data flow, which can be relevant to discriminate malware from benign apps. Method: In this paper, we propose a topic-specific approach to malware comprehension based on app descriptions and data-flow information. First, we use an advanced topic model, adaptive LDA with GA, to cluster apps according to their descriptions. Then, we use the information gain ratio of sensitive data flow information to build so-called “topic-specific data flow signatures”. Results: We conduct an empirical study on 3691 benign and 1612 malicious apps. We group them into 118 topics and generate topic-specific data flow signatures. We verify the effectiveness of the topic-specific data flow signatures by comparing them with the overall data flow signature. In addition, we perform a deeper analysis on 25 representative topic-specific signatures and derive several implications. Conclusion: Topic-specific data flow signatures are efficient in highlighting malicious behavior, and thus can help in characterizing malware.
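
The abstract names information gain ratio as the measure used to build the signatures. As a rough illustration of that measure only, the sketch below computes the gain ratio of one discrete feature (for example, whether a given sensitive data flow is present in an app) against malicious/benign labels. How the paper encodes data flows and assembles signatures is not stated here, so the feature encoding and the function names are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain ratio of a discrete feature with respect to labels.

    gain ratio = (H(labels) - H(labels | feature)) / H(feature)
    Returns 0.0 when the feature takes a single value (no split information).
    """
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += len(subset) / total * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info else 0.0


if __name__ == "__main__":
    # Hypothetical data: 1 means the data flow occurs in the app.
    has_flow = [1, 1, 0, 0]
    label = ["malicious", "malicious", "benign", "benign"]
    print(gain_ratio(has_flow, label))
```

Under this reading, flows with a high gain ratio within one topic cluster would be the natural candidates for that topic's signature.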

Institutional Knowledge at Singapore Management University

Open Repository and Bibliography - Luxembourg

Augmenting and structuring user queries to support efficient free-form code search

Authors: BISSYANDE Tegawendé F.; KIM Dongsun; KIM Kisub; KLEIN Jacques; LO David; SIRRES Raphael; TRAON Yves Le
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present Code voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and StackOverflow Q&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for StackOverflow questions.
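
The idea of augmenting a free-form query with code entities drawn from Q&A posts can be sketched in a few lines. This is a toy illustration, not CoCaBu or GitSearch: the in-memory `QA_POSTS` index, the word-overlap ranking, and the `augment` function with its parameters are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy Q&A index: (question title, code entities in the accepted answer).
QA_POSTS = [
    ("How to read a text file line by line",
     ["BufferedReader", "FileReader", "readLine"]),
    ("Read file into a string",
     ["Files", "readAllBytes", "Paths"]),
    ("How to sort a list of strings",
     ["Collections", "sort"]),
]


def tokens(text):
    """Lowercased word set of a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def augment(query, posts=QA_POSTS, top_posts=2, top_entities=3):
    """Attach code entities from the most similar Q&A posts to a query.

    Posts are ranked by word overlap with the query; posts with no
    overlap contribute nothing.
    """
    q = tokens(query)
    ranked = sorted(posts, key=lambda p: len(q & tokens(p[0])), reverse=True)
    counts = Counter()
    for title, entities in ranked[:top_posts]:
        if q & tokens(title):
            counts.update(entities)
    return {
        "text": query,
        "code_entities": [e for e, _ in counts.most_common(top_entities)],
    }


if __name__ == "__main__":
    print(augment("sort strings"))
```

The augmented query carries both the original words and structural terms, so a code index keyed on class and method names has something to match against even when the user never typed them.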

Crossref

Institutional Knowledge at Singapore Management University

Open Repository and Bibliography - Luxembourg

DexBERT: Effective, task-agnostic and fine-grained representation learning of Android bytecode

Authors: ALLIX Kevin; BISSYANDE Tegawendé F.; KIM Dongsun; KIM Kisub; KLEIN Jacques; LO David; SUN Tiezhu; ZHOU Xin
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/10/2023

Institutional Knowledge at Singapore Management University

DigBug: Pre/post-processing operator selection for accurate bug localization

Authors: BISSYANDE Tegawendé F.; GHATPANDE Sankalp; KIM Dongsun; KIM Kisub; KLEIN Jacques; KOYUNCU Anil; LE TRAON Yves; LIU Kui
Publication venue: 'Elsevier BV'
Publication date: 01/07/2022

Bug localization is a recurrent maintenance task in software development. It aims at identifying relevant code locations (e.g., code files) that must be inspected to fix bugs. When such bugs are reported by users, the localization process often becomes overwhelming, as it is mostly a manual task due to the incomplete and informal information (written in natural languages) available in bug reports. The research community has therefore invested in automated approaches, notably using Information Retrieval techniques. Unfortunately, reported performance in the literature is still limited for practical usage. Our key observation, after empirically investigating a large dataset of bug reports as well as the workflow and results of state-of-the-art approaches, is that most approaches attempt localization for every bug report without considering the different characteristics of the bug reports. We propose DigBug as a straightforward approach to specialized bug localization. This approach selects pre/post-processing operators based on the attributes of bug reports, and the bug localization model is parameterized accordingly. Our experiments confirm that, by departing from “one-size-fits-all” approaches, DigBug outperforms the state-of-the-art techniques by 6 and 14 percentage points on average in terms of MAP and MRR, respectively.
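
The two metrics cited in this abstract, MAP (mean average precision) and MRR (mean reciprocal rank), have standard definitions that the following sketch implements. The file names and ranked lists in the usage example are made up; nothing here reproduces DigBug's data or evaluation scripts.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_metric(metric, queries):
    """Average a per-query metric over (ranked_list, relevant_set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in queries) / len(queries)


if __name__ == "__main__":
    # Hypothetical results: one ranked file list per bug report,
    # with the set of files actually changed by the fix.
    queries = [
        (["A.java", "B.java", "C.java"], {"B.java"}),
        (["X.java", "Y.java", "Z.java"], {"X.java", "Z.java"}),
    ]
    print("MRR:", mean_metric(reciprocal_rank, queries))
    print("MAP:", mean_metric(average_precision, queries))
```

MRR rewards placing the first buggy file near the top, while MAP also accounts for where the remaining buggy files land, which is why the two are usually reported together.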

Institutional Knowledge at Singapore Management University

Sabanci University Research Database

Open Repository and Bibliography - Luxembourg